Utilizing Haver Analytics Data in R

In this tutorial, we explore how to access and use Haver Analytics data within R. We do this with the Haver package rather than IMF Datatools, which we explored previously.

Prerequisites

Loading additional packages

To begin, we need to make sure the Haver package is installed and loaded into our environment. The package should already have been installed during the setup portion at the beginning of the book, but if you are jumping straight to this section, use the following command in your R console to install it.

install.packages("Haver")

Next, use this command to load the Haver package.

library(Haver)
Using DLXDB environment variable for setting the Haver path.
Haver path set to: \\imfdata\econ\DATA\DLX\DATA\
Restoring default Haver query limits.

Before accessing any Haver databases, we need to configure the path to our Haver data directory. This step is crucial for R to locate and interact with the data.

haver.path("\\\\imfdata\\econ\\data\\dlx\\data\\")
Haver path set to: \\imfdata\econ\DATA\DLX\DATA\

You should now be set up to use Haver data directly in R.
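
Before querying, it can help to confirm that the data directory is actually reachable, since an unavailable network share is a likely cause of failed retrievals. The sketch below uses only base R. The helper name check_haver_dir is hypothetical (it is not part of the Haver package), and the code assumes that the DLXDB environment variable mentioned in the startup message holds the directory path.

```r
# Hypothetical helper (not part of the Haver package): returns TRUE when
# `path` is a single non-empty string that points to an existing directory.
check_haver_dir <- function(path) {
  is.character(path) && length(path) == 1 && !is.na(path) &&
    nzchar(path) && dir.exists(path)
}

# Assumption: DLXDB holds the Haver data directory, as the startup message suggests.
haver_dir <- Sys.getenv("DLXDB", unset = "")

if (check_haver_dir(haver_dir)) {
  message("Haver directory found: ", haver_dir)
} else {
  message("Haver directory not reachable; check the DLXDB variable and the network connection.")
}
```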

Accessing Haver Analytics Data

Retrieving Data from Haver

The haver.data function retrieves series from a specific Haver database. Let’s retrieve annual debt-to-GDP data for several countries from the “EMERGELA” database and view the last few rows.

#Input the desired series codes
vars <- c('A312FDPG','A311FDPG','A213FDP','A223FDP','A321FDPG','A243FDP')

#Specify the database and frequency
mydata <- haver.data(codes = vars, database = "emergela", frequency = "annual")

#View the tail end of the data
tail(mydata)
     a312fdpg a311fdpg a213fdp a223fdp a321fdpg a243fdp
2019    48.49    74.57   89.60   74.44    78.04    40.4
2020    68.29    96.89  103.66   86.94   109.10    56.6
2021    55.13    90.13   80.65   77.31   109.19    50.4
2022    33.52    79.56   84.95   71.68   105.09    45.5
2023       NA       NA  156.27   73.83       NA    45.1
2024       NA       NA   83.07   76.50       NA    46.3
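
Because some series stop earlier than others (the NA entries for 2023 and 2024 above), it is often useful to find the latest available observation for each series. The following is a minimal base R sketch. It works on a stand-in data frame built from two columns of the output above, and it assumes the retrieved object has been converted with as.data.frame so that years appear as row names.

```r
# Stand-in for the last rows of `mydata`, copied from the output above.
debt <- data.frame(
  a312fdpg = c(48.49, 68.29, 55.13, 33.52, NA, NA),
  a213fdp  = c(89.60, 103.66, 80.65, 84.95, 156.27, 83.07),
  row.names = 2019:2024
)

years <- as.integer(rownames(debt))

# For each series, the most recent year with a non-missing value.
last_year <- sapply(debt, function(x) max(years[!is.na(x)]))

# The value observed in that year.
last_value <- mapply(function(x, yr) x[years == yr], debt, last_year)

last_year
last_value
```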

You can also retrieve the metadata for every series in a Haver database, rather than for selected series codes, but this takes quite a while to load. The command is provided below and is optional.

hv_metadata_df <- as.data.frame(haver.metadata( database = "USECON" ))

Retrieving Metadata from Haver

It is also important to understand the metadata of the variables we are working with.

Let’s use the haver.metadata function to retrieve this information and list the description of each variable, to double-check that these are the series we want.

mymeta <- haver.metadata(codes = vars, database = "emergela")
mymetadf <- as.data.frame(mymeta)
mydesc <- mymetadf[, c("code", "descriptor")]
tail(mydesc)
      code                                                  descriptor
1 a312fdpg                     Anguilla: Public Sector Debt to GDP (%)
2 a311fdpg          Antigua and Barbuda: Public Sector Debt to GDP (%)
3  a213fdp              Argentina: Gross Public Debt as a % of GDP (%)
4  a223fdp     Brazil: Gross General Government Debt as a % of GDP (%)
5 a321fdpg The Commonwealth of Dominica: Public Sector Debt to GDP (%)
6  a243fdp             Dominican Republic: Public Debt as % of GDP (%)
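
Once the descriptors are in a data frame, a named lookup vector makes it easy to translate series codes into readable labels. The sketch below is illustrative only: it rebuilds a two-row stand-in for mydesc from the output above, and the object names desc_lookup and country are hypothetical.

```r
# Stand-in for `mydesc`, with two rows copied from the output above.
mydesc <- data.frame(
  code = c("a213fdp", "a223fdp"),
  descriptor = c("Argentina: Gross Public Debt as a % of GDP (%)",
                 "Brazil: Gross General Government Debt as a % of GDP (%)"),
  stringsAsFactors = FALSE
)

# Named character vector: names are series codes, values are descriptors.
desc_lookup <- setNames(mydesc$descriptor, mydesc$code)

# Country name = the text before the first colon in each descriptor.
country <- setNames(sub(":.*$", "", mydesc$descriptor), mydesc$code)

desc_lookup[["a213fdp"]]
country[["a223fdp"]]
```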

Retrieving Data for a specific period

What if we wanted to retrieve data only for a specific period?

Let’s retrieve a higher-frequency indicator, currency in circulation for Argentina and Mexico, and specify the frequency and the start date of the period we want.

#Specifying the series codes, database, frequency, and start date
currency <- haver.data(codes=c("n213fmtc","c273fmce"), 
                       database = "EMERGELA", freq="q", 
                       start=as.Date("2010-01-01", format="%Y-%m-%d"))
head(currency)
         n213fmtc c273fmce
2010-Q1  94302.10 597193.9
2010-Q2  95156.17 577815.5
2010-Q3 104835.63 588091.8
2010-Q4 114378.90 693423.1
2011-Q1 127734.27 634711.8
2011-Q2 132877.80 635323.3

In the start argument, the as.Date() function turns the string "2010-01-01" into a date that R can understand. The format argument tells R how the date is written (here, year-month-day). This tells R exactly which date to start from when pulling the data, as the first row of the output (2010-Q1) confirms.
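
To make the role of the format string concrete, here is a small base R sketch that is independent of Haver. It parses the same calendar date written in two different layouts; the second layout (day/month/year) is an illustrative assumption and does not appear in the retrieval code.

```r
# ISO dates (year-month-day) are parsed without an explicit format.
iso <- as.Date("2010-01-01")

# Other layouts need a format string: %d = day, %m = month, %Y = four-digit year.
dmy <- as.Date("01/01/2010", format = "%d/%m/%Y")

# Both calls describe the same calendar day.
identical(iso, dmy)
```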

For clarity, we should replace the series codes in the column names with descriptive names.

#Renaming series codes to descriptors
colnames(currency)[colnames(currency) == "n213fmtc"] <- "Arg_Curr_in_Circ"

colnames(currency)[colnames(currency) == "c273fmce"] <- "Mex_Curr_in_Circ"

Let’s quickly check again that the retrieval starts in the correct period and that the series have been renamed correctly.

head(currency)
        Arg_Curr_in_Circ Mex_Curr_in_Circ
2010-Q1         94302.10         597193.9
2010-Q2         95156.17         577815.5
2010-Q3        104835.63         588091.8
2010-Q4        114378.90         693423.1
2011-Q1        127734.27         634711.8
2011-Q2        132877.80         635323.3

Data Formats and Aggregation Methods

Setting up the Data for Analysis

When downloading data from Haver Analytics, you can specify the format in which you retrieve the data (e.g., zoo, xts) and control how the data is aggregated (e.g., end of period, period average). This flexibility is helpful when working with different time series structures and performing specific analyses.

Retrieving Data in Different Formats

Haver data can be held as a plain data frame or as a zoo or xts object; the latter two are well suited to time series analysis.

Downloading Data as a Plain Data Frame

The as.data.frame function converts the retrieved data into a basic data frame, without any time series-specific structure. This can be helpful for tasks where time series functionality isn’t necessary.

# Downloading data as a plain data frame
lr_us_df <- as.data.frame(haver.data(codes = "LR", database = "USECON", freq = "m"))

# Display the first few rows of the data frame
head(lr_us_df)
          lr
1948-Jan 3.4
1948-Feb 3.8
1948-Mar 4.0
1948-Apr 3.9
1948-May 3.5
1948-Jun 3.6
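
In the plain data frame, the dates live in the row names as text such as "1948-Jan". If a proper Date column is needed (for merging or filtering, say), one possible approach is sketched below in base R. It uses a three-row stand-in for lr_us_df and assumes the row names always follow the year-month-abbreviation pattern shown above.

```r
# Stand-in for `lr_us_df`, copied from the output above.
lr_df <- data.frame(lr = c(3.4, 3.8, 4.0),
                    row.names = c("1948-Jan", "1948-Feb", "1948-Mar"))

# Split "1948-Jan" into the year and the month abbreviation.
parts <- strsplit(rownames(lr_df), "-", fixed = TRUE)
yr <- as.integer(vapply(parts, `[`, character(1), 1))

# month.abb holds base R's English month abbreviations, so the lookup
# does not depend on the session locale.
mo <- match(vapply(parts, `[`, character(1), 2), month.abb)

# Store the first day of each month as a Date column.
lr_df$date <- as.Date(sprintf("%d-%02d-01", yr, mo))

lr_df
```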

Downloading Data as a zoo Object

The haver.as.zoo() function converts Haver data into a zoo object, which is a common format for time series data. You would choose zoo for general time series tasks, especially when dealing with irregular or missing data. This is how you would perform this conversion.

#Load the xts and zoo library 
library(xts) 
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
library(zoo)  

# Convert the Haver data to a zoo object 
data_zoo <- haver.as.zoo(haver.data(codes = "LR", database = "USECON", freq = "m"))  

# Display the first few rows of the zoo object 
head(data_zoo)
            lr
1948-01-31 3.4
1948-02-29 3.8
1948-03-31 4.0
1948-04-30 3.9
1948-05-31 3.5
1948-06-30 3.6
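
To illustrate why zoo is convenient when observations are missing, the sketch below builds a small stand-in series (no Haver connection needed), fills a gap with na.locf, and cuts a date window with window. The NA is inserted purely for illustration; the actual March 1948 value in the output above is 4.0.

```r
library(zoo)

# Stand-in monthly series with month-end dates; the March value is set to
# NA for illustration only.
dates <- as.Date(c("1948-01-31", "1948-02-29", "1948-03-31", "1948-04-30"))
z <- zoo(c(3.4, 3.8, NA, 3.9), order.by = dates)

# Carry the last observation forward to fill the gap.
z_filled <- na.locf(z)

# Restrict the series to a date window (February and March 1948).
z_sub <- window(z_filled,
                start = as.Date("1948-02-01"),
                end   = as.Date("1948-03-31"))

z_sub
```

Whether carrying values forward is appropriate depends on the series; for many indicators, leaving the gap as NA is the safer choice.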

Downloading Data as an xts Object

The xts format is preferred for financial or economic data that require advanced time series handling, particularly with regular intervals like daily or monthly data. If you prefer working with the xts format, which extends the functionality of zoo, you can convert the data as follows:

# Convert the zoo object to an xts object 
data_xts <- as.xts(data_zoo)  
# Display the first few rows of the xts object 
head(data_xts) 
            lr
1948-01-31 3.4
1948-02-29 3.8
1948-03-31 4.0
1948-04-30 3.9
1948-05-31 3.5
1948-06-30 3.6
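
Two xts features that are often useful in practice are date-range subsetting with ISO-8601 strings and period-wise summaries. The sketch below shows both on a stand-in object built from the six values above; the object names are illustrative, and the quarterly average is computed locally by xts rather than by Haver.

```r
library(xts)

# Stand-in for `data_xts`, using the six monthly values shown above.
dates <- as.Date(c("1948-01-31", "1948-02-29", "1948-03-31",
                   "1948-04-30", "1948-05-31", "1948-06-30"))
lr_xts <- xts(c(3.4, 3.8, 4.0, 3.9, 3.5, 3.6), order.by = dates)
colnames(lr_xts) <- "lr"

# ISO-8601 range strings select rows by date: February through April 1948.
feb_to_apr <- lr_xts["1948-02/1948-04"]

# Average the monthly values within each calendar quarter.
lr_quarterly <- apply.quarterly(lr_xts, mean)

feb_to_apr
lr_quarterly
```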

We can also chain the two conversions in a single step, turning Haver data into a zoo object and then into an xts object, as in the example below using our previously downloaded currency data.

#Convert Haver data to zoo, then to xts
currency <- as.xts(haver.as.zoo(currency)) 

Lastly, the data sometimes contain NA values. The command below removes them; note that na.omit drops every row in which at least one series is missing.

# Remove NA values 
currency <- na.omit(currency)
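
The row-wise behavior of na.omit matters when series have different coverage. The base R sketch below uses made-up values (not Haver data) to show that dropping incomplete rows can also discard valid observations from a series that has no gaps of its own.

```r
# Made-up two-column table: series `a` is missing in the last two rows.
x <- data.frame(a = c(1.0, 2.0, NA, NA), b = c(10, 20, 30, 40))

# na.omit drops every row that contains at least one NA ...
complete_rows <- na.omit(x)
nrow(complete_rows)

# ... so series `b` loses two valid observations. Cleaning one column at a
# time keeps them.
b_only <- na.omit(x$b)
length(b_only)
```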

Specifying Aggregation Methods

When downloading data from Haver, you may need to aggregate the values differently depending on your analysis requirements. For example, you may want to retrieve data based on the end of the period, period average, or perform more complex aggregations like summing monthly data into quarterly values.
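
The three summaries named above can be compared on a small example. The base R sketch below reuses the six monthly unemployment-rate values for 1948 shown earlier in this tutorial and computes a period average, an end-of-period value, and a sum for each quarter. It is done locally with tapply and is not how Haver performs the aggregation; summing a rate is not economically meaningful and appears here only to show the arithmetic.

```r
# Monthly values for January to June 1948, from the LR output shown earlier.
monthly <- c(3.4, 3.8, 4.0, 3.9, 3.5, 3.6)
quarter <- rep(c("1948-Q1", "1948-Q2"), each = 3)

# Average within each quarter.
period_avg <- tapply(monthly, quarter, mean)

# Last monthly value of each quarter.
period_end <- tapply(monthly, quarter, function(x) x[length(x)])

# Total within each quarter (arithmetic illustration only).
period_sum <- tapply(monthly, quarter, sum)

period_avg
period_end
period_sum
```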

Haver allows you to control these behaviors using the aggmode parameter. The key modes are:

  • Strict Mode: Aggregate only if all data points for the period are available.

  • Relaxed Mode: Aggregate if at least one data point for the period is available.

  • Force Mode: Always aggregate, even if data points are missing.
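
To make the difference between the first two modes concrete, here is a toy base R function that mimics the definitions above for a single period. It is a hypothetical illustration, not Haver's implementation; it uses an average as the summary and made-up values, and it leaves out force mode, whose behavior is not spelled out in enough detail here to imitate.

```r
# Hypothetical helper illustrating the mode definitions above; it is not
# Haver's implementation. `x` holds the observations within one period.
aggregate_period <- function(x, mode = c("strict", "relaxed")) {
  mode <- match.arg(mode)
  n_ok <- sum(!is.na(x))
  # Strict: any missing observation makes the whole period NA.
  if (mode == "strict" && n_ok < length(x)) return(NA_real_)
  # Either mode: nothing to aggregate if every observation is missing.
  if (n_ok == 0) return(NA_real_)
  # Otherwise average whatever is available.
  mean(x, na.rm = TRUE)
}

q <- c(5.0, NA, 6.0)  # made-up quarter with one missing month

aggregate_period(q, "strict")   # NA, because one month is missing
aggregate_period(q, "relaxed")  # 5.5, the average of the two available months
```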

Downloading Data in Strict Mode

End-of-period values are useful when you are interested in the final value of a time period, such as the last day of a month or the last month of a quarter, because they show the status of a variable at the period’s conclusion.

Note, however, that aggmode does not choose between end-of-period values, averages, and sums. As the list above shows, it only controls how missing observations are handled; the aggregation method itself appears to be determined by the series. Setting aggmode = "strict" tells Haver to aggregate only if all the data points for the period are available, which protects data integrity.

Let’s look at this in the case of the U.S. federal funds rate. In the output, the 2024-Q3 value of 5.263333 is consistent with an average of the monthly readings rather than with a quarter-end value, and 2025-Q2 is NA, presumably because that quarter was still incomplete when the data were retrieved.

# Retrieve quarterly U.S. federal funds rate data in strict mode
ffr_end_period <- haver.data(codes = "FFED", database = "USECON", freq = "q", aggmode = "strict")

# Display the last few rows
tail(ffr_end_period)
            ffed
2024-Q1 5.330000
2024-Q2 5.330000
2024-Q3 5.263333
2024-Q4 4.650000
2025-Q1 4.330000
2025-Q2       NA

Downloading Data in Relaxed Mode

Period averages are useful when you are interested in the average value over a period (e.g., monthly data averaged over a quarter), for instance to smooth out volatility or to report aggregate trends.

Setting aggmode = "relaxed" aggregates the data as long as at least one data point for the period is available, so a quarter with only one or two monthly observations can still yield a value. The mode governs missing observations only; it does not by itself request an average. Let’s look at how this works for the U.S. consumer price index, the price-level index from which inflation is computed.

# Retrieve quarterly U.S. CPI data in relaxed mode
cpi_us_avg <- haver.data(codes = "PCUN", database = "USECON", freq = "q", aggmode = "relaxed")

# Display the last few rows
tail(cpi_us_avg)
            pcun
2024-Q1 310.3583
2024-Q2 313.9307
2024-Q3 314.8790
2024-Q4 315.5873
2025-Q1 318.8507
2025-Q2       NA

Downloading Data in Force Mode

In some cases, you may need to sum data over a period, such as when adding up monthly flows to obtain a quarterly total. This is particularly useful for series where the sum over a period provides more insight than the average or end value.

Setting aggmode = "force" ensures that data is aggregated even if some data points for the period are missing. As with the other modes, it does not by itself select a sum. The example below requests GDP at quarterly frequency; U.S. GDP is already published quarterly, so the values shown are the quarterly figures themselves rather than sums of monthly data.

# Retrieve quarterly U.S. GDP data in force mode
gdp_sum <- haver.data(codes = "GDP", database = "USECON", freq = "q", aggmode = "force")

# Display the first few rows
head(gdp_sum)
          gdp
1947-Q1 243.2
1947-Q2 246.0
1947-Q3 249.6
1947-Q4 259.7
1948-Q1 265.7
1948-Q2 272.6

Example: U.S. Unemployment Rate

Now that we have learned to retrieve data and clean it, let’s retrieve U.S. unemployment rate from Haver and visualize it in a simple line graph.

Retrieving U.S. Unemployment Rate

We’ll use the haver.data function to retrieve the U.S. unemployment rate from the “USECON” database. The series code for this data is “LR”.

# Specifying the series code for the U.S. unemployment rate from the USECON database
lr_us <- haver.data(codes = "LR", database = "USECON", freq = "m")

# Display the first few rows of the retrieved data
head(lr_us)
          lr
1948-Jan 3.4
1948-Feb 3.8
1948-Mar 4.0
1948-Apr 3.9
1948-May 3.5
1948-Jun 3.6

Converting Data to a Time Series (zoo)

We then convert the retrieved data into a zoo object, as shown previously, which is well suited for time series analysis.

# Load the necessary libraries for time series data
library(xts)
library(zoo)

# Convert the Haver data into a zoo object for time series handling
lr_us_zoo <- haver.as.zoo(lr_us)

# Display the first few rows of the converted zoo object
head(lr_us_zoo)
            lr
1948-01-31 3.4
1948-02-29 3.8
1948-03-31 4.0
1948-04-30 3.9
1948-05-31 3.5
1948-06-30 3.6

Plotting U.S. Unemployment Rate

We can now plot the zoo object directly using the autoplot function. autoplot is a generic function from ggplot2, and the zoo package supplies a method for zoo objects (autoplot.zoo), which simplifies the plotting process.

# Load ggplot2; the autoplot method for zoo objects comes from the zoo package loaded earlier
library(ggplot2)

# Plot the zoo object directly; autoplot dispatches to autoplot.zoo
autoplot(lr_us_zoo) +
  labs(title = "U.S. Unemployment Rate (Monthly)",
       x = "Time",
       y = "Unemployment Rate (%)") +
  theme_minimal()
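
When more control over the chart is needed than autoplot offers, an alternative is to flatten the zoo object into a data frame and build the plot with ggplot directly. The sketch below assumes a stand-in object with the six values shown above; the names lr_zoo and lr_plot_df are illustrative.

```r
library(zoo)
library(ggplot2)

# Stand-in for `lr_us_zoo`, using the six monthly values shown above.
dates <- as.Date(c("1948-01-31", "1948-02-29", "1948-03-31",
                   "1948-04-30", "1948-05-31", "1948-06-30"))
lr_zoo <- zoo(matrix(c(3.4, 3.8, 4.0, 3.9, 3.5, 3.6), ncol = 1,
                     dimnames = list(NULL, "lr")),
              order.by = dates)

# Flatten to a data frame: one column for dates, one for values.
lr_plot_df <- data.frame(date = index(lr_zoo),
                         lr   = as.numeric(coredata(lr_zoo)))

# Build the line chart layer by layer.
p <- ggplot(lr_plot_df, aes(x = date, y = lr)) +
  geom_line() +
  labs(title = "U.S. Unemployment Rate (Monthly)",
       x = "Time",
       y = "Unemployment Rate (%)") +
  theme_minimal()

print(p)
```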

In this tutorial, we explored how to effectively utilize Haver Analytics data in R, from data retrieval to formatting, aggregation, and visualization.